Attribute-Guided Face Generation Using Conditional CycleGAN
We are interested in attribute-guided face generation: given a low-res face input image and an attribute vector that can be extracted from a high-res image (the attribute image), our new method generates a high-res face image for the low-res input that satisfies the given attributes. To address this problem, we condition the CycleGAN and propose the conditional CycleGAN, which is designed to 1) handle unpaired training data, because the low-res/high-res training images and the high-res attribute images may not align with each other, and 2) allow easy control of the appearance of the generated face via the input attributes. We demonstrate impressive results with the attribute-guided conditional CycleGAN, which can synthesize realistic face images whose appearance is easily controlled by user-supplied attributes (e.g., gender, makeup, hair color, eyeglasses). By using the attribute image as the identity to produce the corresponding conditional vector, and by incorporating a face verification network, the attribute-guided network becomes an identity-guided conditional CycleGAN, which produces impressive and interesting results on identity transfer. We demonstrate three applications of the identity-guided conditional CycleGAN: identity-preserving face super-resolution, face swapping, and frontal face generation, which consistently show the advantage of our new method.
Comment: ECCV 201
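The conditioning scheme the abstract describes can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch generator that upsamples a low-res face while a user-supplied attribute vector is tiled spatially and concatenated with intermediate features; the layer sizes, depth, and 4x upsampling factor are illustrative assumptions, not the paper's exact architecture, and the full method additionally trains CycleGAN discriminators and cycle-consistency losses not shown here.

```python
# Minimal sketch (assumed PyTorch): conditioning a super-resolution generator
# on an attribute vector. The attribute vector (e.g., gender, hair color) is
# tiled over the spatial grid and concatenated with encoder features before
# decoding to a high-res face. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, attr_dim=10, base_ch=64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, base_ch, 3, padding=1), nn.ReLU(inplace=True))
        # After injecting attributes, decode and upsample 4x (e.g., 32x32 -> 128x128).
        self.decode = nn.Sequential(
            nn.Conv2d(base_ch + attr_dim, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(base_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(base_ch, 3, 3, padding=1), nn.Tanh())

    def forward(self, lowres, attrs):
        feat = self.encode(lowres)                       # (B, C, H, W)
        # Tile the attribute vector spatially and concatenate with features.
        a = attrs[:, :, None, None].expand(-1, -1, feat.size(2), feat.size(3))
        return self.decode(torch.cat([feat, a], dim=1))

g = ConditionalGenerator()
out = g(torch.randn(1, 3, 32, 32), torch.randn(1, 10))  # -> (1, 3, 128, 128)
```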
Latent Embeddings for Collective Activity Recognition
Rather than simply recognizing the action of each person individually, collective activity recognition aims to determine what a group of people is doing in a collective scene. Previous state-of-the-art methods use hand-crafted potentials in conventional graphical models, which can only define a limited range of relations; as a result, the complex structural dependencies among the individuals involved in a collective scenario cannot be fully modeled. In this paper, we overcome these limitations by embedding latent variables into the feature space and learning the feature mapping functions in a deep learning framework. The embeddings of latent variables build a global relation containing person-group interactions and richer contextual information by jointly modeling a broader range of individuals. In addition, we incorporate an attention mechanism during embedding to achieve more compact representations. We evaluate our method on three collective activity datasets, one of which is a much larger dataset contributed in this work. The proposed model achieves clearly better performance than the state-of-the-art methods in our experiments.
Comment: 6 pages, accepted by IEEE-AVSS201
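As a rough illustration of the attention-weighted embedding described above, here is a minimal, hypothetical PyTorch sketch: per-person features are mapped into a latent space, attention scores determine how much each individual contributes, and the weighted sum yields a compact group-level representation for classifying the collective activity. The dimensions and the linear scoring function are assumptions, not the paper's exact design.

```python
# Minimal sketch (assumed PyTorch): attention-weighted pooling of per-person
# latent embeddings into one group representation. Sizes are illustrative.
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=128, num_activities=5):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)      # latent embedding of each person
        self.score = nn.Linear(embed_dim, 1)             # per-person attention score
        self.classify = nn.Linear(embed_dim, num_activities)

    def forward(self, person_feats):                     # (B, N_people, feat_dim)
        z = torch.tanh(self.embed(person_feats))         # (B, N, embed_dim)
        w = torch.softmax(self.score(z), dim=1)          # (B, N, 1), sums to 1 over people
        group = (w * z).sum(dim=1)                       # attention-weighted group embedding
        return self.classify(group)                      # collective activity logits

model = AttentivePooling()
logits = model(torch.randn(2, 8, 512))                   # 2 scenes, 8 people each
```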
CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
An increasing number of public datasets have had a marked impact on automated organ segmentation and tumor detection. However, due to the small size and partial labels of each dataset, as well as the limited investigation of diverse tumor types, the resulting models are often limited to segmenting specific organs/tumors, ignore the semantics of anatomical structures, and cannot be extended to novel domains. To address these issues, we propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models. This CLIP-based label encoding captures anatomical relationships, enabling the model to learn a structured feature embedding and to segment 25 organs and 6 types of tumors. The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training, and is then evaluated on 6,162 external CT scans from 3 additional datasets. We rank first on the Medical Segmentation Decathlon (MSD) public leaderboard and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Additionally, the Universal Model is computationally more efficient (6x faster) than dataset-specific models, generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks.
Comment: Ranked first in the Medical Segmentation Decathlon (MSD) Competition
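To make the label-encoding idea concrete, here is a minimal, hypothetical PyTorch sketch: each class name (e.g., "liver", "liver tumor") is represented by a fixed text embedding, standing in for the output of CLIP's text encoder, and per-voxel image features are compared against those embeddings to produce one mask per class. The stand-in 3D backbone and the dot-product head are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch (assumed PyTorch): text embeddings as label encodings for
# multi-class segmentation. Random vectors stand in for CLIP text embeddings;
# the one-layer 3D backbone is a placeholder for a real encoder.
import torch
import torch.nn as nn

class TextDrivenSegHead(nn.Module):
    def __init__(self, text_dim=512, feat_ch=64, num_classes=31):  # 25 organs + 6 tumors
        super().__init__()
        self.backbone = nn.Conv3d(1, feat_ch, 3, padding=1)        # stand-in 3D encoder
        self.project = nn.Linear(text_dim, feat_ch)                # text -> feature space

    def forward(self, ct_volume, text_embeds):
        feat = self.backbone(ct_volume)                  # (B, C, D, H, W)
        q = self.project(text_embeds)                    # (num_classes, C)
        # Per-voxel similarity between image features and each class embedding.
        logits = torch.einsum('bcdhw,kc->bkdhw', feat, q)
        return torch.sigmoid(logits)                     # one binary mask per class

head = TextDrivenSegHead()
text = torch.randn(31, 512)                              # e.g., CLIP text embeddings
masks = head(torch.randn(1, 1, 32, 64, 64), text)        # -> (1, 31, 32, 64, 64)
```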